Goto

Collaborating Authors

 cost effective active search



Cost Effective Active Search

Neural Information Processing Systems

We study a special paradigm of active learning, called cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. We adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio lower bounded by $\Omega(n^{0.16})$. We then propose an efficient and nonmyopic policy using the negative Poisson binomial distribution. We propose simple and fast approximations for computing its expectation, which serves as an essential role in our proposed policy. We conduct comprehensive experiments on various domains such as drug and materials discovery, and demonstrate that our proposed search procedure is superior to the widely used greedy baseline.



Reviews: Cost Effective Active Search

Neural Information Processing Systems

The paper considers a Bayesian decision theoretic formulation of the problem of minimizing the number of queries to identify the desired number of positive instances (instances with positive labels), given a probabilistic model of the labels in the dataset. This formulation is motivated by the material and drug discovery problems. The problem is properly formulated and contrasted with the recently suggested budgeted-learning setting, where the goal is to identify the largest number of positive instances given a fixed budget on queries. Further the authors show that the optimal Bayesian policy is hard to compute and hard to approximate. However, further assuming certain conditional independence the policy can be approximated efficiently using the negative-poisson-binomial distribution, for which the authors propose computationally-cheap expectation estimates.The resulting policy is compared to several other alternatives, and it is shown to obtain overall superior performance in both material discovery and drug discovery datasets.


Cost Effective Active Search

Neural Information Processing Systems

We study a special paradigm of active learning, called cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. We adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio lower bounded by \Omega(n {0.16}) . We then propose an efficient and nonmyopic policy using the negative Poisson binomial distribution.


Cost Effective Active Search

Neural Information Processing Systems

We study a special paradigm of active learning, called cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. We adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio lower bounded by $\Omega(n {0.16})$. We then propose an efficient and nonmyopic policy using the negative Poisson binomial distribution.